Kalman filter control in the reinforcement learning framework
Authors
Abstract
There is growing interest in using Kalman-filter models in brain modelling. It is therefore of considerable importance to make Kalman filters amenable to reinforcement learning. In the usual formulation, optimal control is computed off-line by solving a backward recursion. In this technical note we show that a slight modification of the linear-quadratic-Gaussian Kalman-filter model allows on-line estimation of the optimal control and builds a bridge to reinforcement learning. Moreover, the learning rule for value estimation assumes a Hebbian form weighted by the error of the value estimation.

1. Motivation

Kalman filters and their various extensions are well-studied and widely applied tools in both state estimation and control. Recently, there has been increasing interest in Kalman filters and Kalman-filter-like structures as models of neurobiological substrates. It has been suggested that Kalman filtering (i) may occur in sensory processing [6, 7], (ii) may be the underlying computation of the hippocampus, and (iii) may be the underlying principle in control architectures [8, 9]. Detailed architectural similarities between the Kalman filter and the entorhinal-hippocampal loop, as well as between Kalman filters and the neocortical hierarchy, have been described recently [2, 3]. The interplay between the dynamics of Kalman-filter-like architectures and the learning of parameters of neuronal networks is promising for explaining known and puzzling phenomena such as priming, repetition suppression and categorization [4, 1].

As is well known, the Kalman filter provides an on-line estimate of the state of the system. Optimal control, on the other hand, cannot be computed on-line, because it is typically given by a backward recursion (the Riccati equations). For on-line parameter estimation without control aspects, see [5]. The aim of this paper is to derive an on-line control method for the Kalman filter that achieves optimal performance asymptotically.
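To illustrate why the classical solution is off-line, the following is a minimal sketch of the backward Riccati recursion for a scalar, finite-horizon linear-quadratic problem. It is not taken from the paper: the model x[t+1] = a*x[t] + b*u[t] + noise, the cost weights q, r and q_final, and the function name riccati_backward are all assumptions made for illustration. The point is only that the gain at time 0 depends on quantities computed from the end of the horizon backwards.

```python
def riccati_backward(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for a scalar finite-horizon LQ problem.

    Assumed (hypothetical) model, not the paper's notation:
        x[t+1] = a * x[t] + b * u[t] + noise
        cost   = sum_t (q * x[t]**2 + r * u[t]**2) + q_final * x[horizon]**2
    Returns the feedback gains k[0..horizon-1] (control u[t] = -k[t] * x[t])
    and the cost-to-go coefficients p[0..horizon].
    """
    p = [0.0] * (horizon + 1)
    k = [0.0] * horizon
    p[horizon] = q_final  # the recursion starts at the END of the horizon
    for t in range(horizon - 1, -1, -1):  # and runs backwards in time
        denom = r + b * b * p[t + 1]
        k[t] = a * b * p[t + 1] / denom
        p[t] = q + a * a * p[t + 1] - (a * b * p[t + 1]) ** 2 / denom
    return k, p


# Example with illustrative parameter values.
gains, costs = riccati_backward(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=50)
print(gains[0], costs[0])
```

Because every gain is obtained only after sweeping back from the final time step, the whole horizon must be known in advance, which is the limitation the on-line method aims to remove.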
A slight modification of the linear-quadratic-Gaussian (LQG) Kalman-filter model is introduced so that the LQG model can be treated as a reinforcement learning (RL) problem.

2. The Kalman filter and the LQG model

Consider a linear dynamical system with state x_t ∈ R, control u_t ∈ R, observation y_t ∈ R, noises w_t ∈ R and e_t ∈ R (which are assumed to be Gaussian
Similar resources
Kalman Filter Control Embedded into the Reinforcement Learning Framework
There is a growing interest in using Kalman filter models in brain modeling. The question arises whether Kalman filter models can be used on-line not only for estimation but for control. The usual method of optimal control of Kalman filter makes use of off-line backward recursion, which is not satisfactory for this purpose. Here, it is shown that a slight modification of the linear-quadratic-ga...
Time Delay and Data Dropout Compensation in Networked Control Systems Using Extended Kalman Filter
In networked control systems, time delay and data dropout can degrade the performance of the control system and even destabilize the system. In the present paper, the Extended Kalman filter is employed to compensate the effects of time delay and data dropout in feedforward and feedback paths of networked control systems. In the proposed method, the extended Kalman filter is used as an observer ...
Kalman Temporal Differences
Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest this last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the following features: sample-efficiency, non-linear approximation, non-stationarity handling and uncert...
Emergence of Game Strategy in Multiagent Systems
In this thesis we focused on subsymbolic approach to machine game play problem. We worked on two different methods of learning. Our first goal was to test the ability of common feed-forward neural networks and the mixture of expert topology. We have derived reinforcement learning algorithm for mixture of expert network topology. This topology is capable to split the problem into smaller parts, ...
Kalman filtering & colored noises: the (autoregressive) moving-average case
The Kalman filter is a well-known and efficient recursive algorithm that estimates the state of a dynamic system from a series of indirect and noisy observations of this state. Its applications range from signal processing to machine learning, through speech processing or computer vision. The underlying model usually assumes white noises. Extensions to colored autoregressive (AR) noise model ar...
Journal title:
- CoRR
Volume cs.LG/0301007, Issue
Pages -
Publication date 2003